Object Detection Algorithm with Dual-Modal Rectification Fusion Based on Self-Guided Attention

doi:10.16451/j.cnki.issn1003-6059.202309003



Authors: ZHANG Jinglei, GONG Wenhao, JIA Xin


Abstract: Traditional dual-modal object detection algorithms struggle to overcome low-contrast noise in complex scenes such as fog, glare and darkness, and they cannot recognize small objects effectively. To solve these problems, an object detection algorithm with dual-modal rectification fusion based on self-guided attention is proposed. Firstly, a dual-modal fusion network is designed that rectifies the low-contrast noise in the input images (visible and infrared images) through channel and spatial feature rectification. Complementary information is then extracted from the rectified features to achieve accurate feature fusion, which improves the detection accuracy of the algorithm in glare, darkness and fog. Secondly, a self-guided attention mechanism is established to capture the dependencies among image pixels, which strengthens the fusion of features at different scales and improves the detection accuracy for small objects. Extensive experiments on six datasets in three categories (pedestrian, pedestrian-vehicle and aerial vehicle) demonstrate that the proposed algorithm achieves high detection accuracy.
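The abstract describes both mechanisms only at a high level, so the NumPy sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the channel and spatial rectification resembles the feature rectification module of CMX (each modality is corrected with the other modality's channel-weighted and spatially weighted features), simplified here to single linear layers. For the self-guided attention, whose exact form is not given, it substitutes plain scaled dot-product self-attention over pixel positions. The function names, weight shapes, the weight `lam = 0.5` and the naive summation used for fusion are all hypothetical.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def rectify(rgb, ir, w_ch, w_sp, lam=0.5):
    """Cross-modal channel + spatial rectification (simplified, CMX-style).

    rgb, ir: (C, H, W) feature maps of the visible and infrared branches.
    w_ch: (2C, 4C) hypothetical weights of the channel branch.
    w_sp: (2, 2C) hypothetical 1x1 weights of the spatial branch.
    """
    c = rgb.shape[0]
    x = np.concatenate([rgb, ir], axis=0)                          # (2C, H, W)
    # Channel branch: global average and max pooling, then a linear layer.
    desc = np.concatenate([x.mean(axis=(1, 2)), x.max(axis=(1, 2))])  # (4C,)
    ch = _sigmoid(w_ch @ desc)                                      # (2C,)
    ch_rgb, ch_ir = ch[:c, None, None], ch[c:, None, None]
    # Spatial branch: a 1x1 projection to one weight map per modality.
    sp = _sigmoid(np.einsum("kc,chw->khw", w_sp, x))                # (2, H, W)
    sp_rgb, sp_ir = sp[0][None], sp[1][None]
    # Each modality is corrected with the other modality's re-weighted features.
    rgb_out = rgb + lam * (ch_ir * ir + sp_ir * ir)
    ir_out = ir + lam * (ch_rgb * rgb + sp_rgb * rgb)
    return rgb_out, ir_out


def pixel_attention_weights(x, wq, wk):
    """(N, N) attention between all N = H*W pixels; every row sums to 1."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                                  # (N, C)
    q, k = tokens @ wq.T, tokens @ wk.T                             # (N, D) each
    return _softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)


def pixel_self_attention(x, wq, wk, wv):
    """Stand-in for self-guided attention: residual self-attention over pixels."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                                  # (N, C)
    out = pixel_attention_weights(x, wq, wk) @ (tokens @ wv.T)      # (N, C)
    return x + out.T.reshape(c, h, w)


rng = np.random.default_rng(0)
C, H, W, D = 4, 8, 8, 2
rgb = rng.standard_normal((C, H, W))
ir = rng.standard_normal((C, H, W))
rgb_r, ir_r = rectify(rgb, ir, 0.1 * rng.standard_normal((2 * C, 4 * C)),
                      0.1 * rng.standard_normal((2, 2 * C)))
fused = rgb_r + ir_r  # naive element-wise fusion, an assumption
wq, wk, wv = (rng.standard_normal(s) for s in [(D, C), (D, C), (C, C)])
out = pixel_self_attention(fused, wq, wk, wv)
print(out.shape)  # (4, 8, 8)
```

The weights here are random, so the sketch only demonstrates the data flow and tensor shapes; in a real detector they would be learned, and the naive sum would be replaced by a learned fusion step.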

Key words: Low-Contrast Noise, Object Detection, Dual-Modal Rectification Fusion, Self-Guided Attention

Received: 2023-07-21

CLC number: TP391.41

Supported by: Young Scientists Fund of the National Natural Science Foundation of China (No. 62302335)

Corresponding author: ZHANG Jinglei, Ph.D., professor. Main research interests: pattern recognition and image processing. E-mail: zhangjinglei@tjut.edu.cn.

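About the authors: GONG Wenhao, master's student. Main research interests: image processing and object detection. E-mail: gwh@stud.tjut.edu.cn. JIA Xin, Ph.D., lecturer. Main research interests: machine learning, image processing and 3D reconstruction. E-mail: tjut_jiaxin@email.tjut.edu.cn.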

Cite this article:

ZHANG Jinglei, GONG Wenhao, JIA Xin. Object Detection Algorithm with Dual-Modal Rectification Fusion Based on Self-Guided Attention. Pattern Recognition and Artificial Intelligence, 2023, 36(9): 793-805.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202309003 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2023/V36/I9/793
